Controlled precision QUBO-based algorithm to compute eigenvectors of symmetric matrices

We describe an algorithm to compute the extremal eigenvalues and corresponding eigenvectors of a symmetric matrix which is based on solving a sequence of Quadratic Binary Optimization problems. This algorithm is robust across many different classes of symmetric matrices; It can compute the eigenvector/eigenvalue pair to essentially any arbitrary precision, and with minor modifications, can also solve the generalized eigenvalue problem. Performance is analyzed on small random matrices and selected larger matrices from practical applications.


Introduction
The problem of computing eigenvectors and eigenvalues to a desired precision has many applications in science and mathematics, including web page ranking [1], planar embeddings [2], and principal component analysis [3], among many others. The recent development of new computing paradigms has led to the development of annealing devices, which are specialized hardware designed to solve Quadratic Binary Optimization Problems (QUBOs). Such annealers include D-Wave's quantum annealers and Fujitsu's Digital Annealer. This has lead to a corresponding interest in reformulating computational tasks as QUBOs and solving them using these annealing devices. This strategy has been applied to several problems including graph partitioning [4], solving polynomial equations [5], and vertex coloring [6]. Here we compute eigenvectors of symmetric matrices by solving a sequence of QUBOs, which allow the eigenvectors and eigenvalues to be found to any desired precision. A mathematically similar approach to this problem is considered in [7,8], but accuracy is increased by increasing the size of the associated QUBO. In contrast, the proposed algorithm can compute eigenvectors to essentially arbitrary precision without increasing the size of the QUBOs, which can have as few as twice as many variables as the original eigenvalue problem. The trade-off for using small QUBOs is that more iterations are required. A similar approach is considered in Appendix C of [9], although here the effects of different parameters are more thoroughly studied, and the presentation gives a very general optimization framework. The performance data is collected using D-Wave's Ocean Simulated Annealing (SA) package. As mentioned, the state-of-art of QUBO eigensolver is proposed in [8]. In the aforementioned work, the only way of increasing precision is to increase the QUBO size by increasing the number of bits required to represent the real numbers. In the hereby proposed method we can compute the eigenpair to arbitrary any precision without increasing the size of the QUBO.
The paper is organized as follows. The relevant mathematical background for symmetric matrices and use of QUBO solvers as a descent method is explained in Section 2.1. The algorithm for computing the eigenvector/eigenvalue pair is given in Section 2.3. Experimental results with various parameters and matrices are presented in Section 3, followed by the conclusion in Section 4.

Mathematical background
Let A be a symmetric matrix. A well-known consequence of the spectral theorem is that the smallest eigenvalue λ and corresponding eigenvector v are global minima for the Rayleigh quo- The proposed algorithm uses a QUBO formulation of the problem to both obtain a good initial guess for the global minimum, and to implement an iterative descent from the initial guess. Similar to classical descent methods such as Newton Conjugate-Gradient and the BFGS algorithms [10], this algorithm requires computing, but not inverting, a Hessian matrix at each descent step. We begin with an overview of QUBOs and how they can be used to approximately solve certain constrained quadratic optimization problems.
Let {0, 1} m denote the set of binary vectors of length m, and let Q be a symmetric m × m matrix. The combinatorial optimization problem is called a quantum unconstrained binary optimization problem, or QUBO, and it is known to be NP-hard [11]. As mentioned before, interest in casting various problems as QUBOs has increased due to the development of annealing devices, which are a class of hardware that use ideas from statistical mechanics to produce approximate solutions to a QUBO. See for example [12] or [13].
To solve a real-variable optimization problem using a QUBO, we require a method of approximating each real variable by b binary variables. This number b will be a parameter referred to as the number of bits.
Let's start with a few concrete examples of the arithmetic involved, beginning with a demonstration of how to multiply two real numbers such as x = −.5 and y = .5 using 2 bits. Form the precision vector p = (−1, .5) with corresponding precision matrix Now let Q be a symmetric matrix, and we shall demonstrate how to compute the quadratic form ðx; y; zÞQ x y z 0 B @ 1 C A using binary variables. As above, set z = p � z b where z b is a binary vector and z is the corresponding real number. We can rewrite the quadratic form as The middle three terms can be written more succinctly as (I 3 � p t )Q(I 3 � p) = Q � P where I 3 is the 3 × 3 identity matrix and � is the tensor product. Now we describe the construction in full generality. Given a precision vector p ¼ is exactly the set We use the sub-scripted x b as a convention to emphasize that x b is a binary vector, i.e. the subscript does not refer to the number of bits. More generally, the n-fold product of C 1,b is the set where I n is the identity matrix. The set C n,b will be referred to as a discretized cube. Let Q be a symmetric n × n matrix, r an n-vector and suppose we want to solve the constrained quadratic programming problem To get an approximate solution to (7) using a QUBO, first replace the unit cube by a discretized unit cube to get the optimization problem.
Setting m = nb, x = (I n � p)x b , and P = p t p, this is equivalent to the following QUBO problem: ¼ argmin ¼ argmin Here Diag(v) refers to the diagonal matrix with entries from v. Going from lines (9) to (10) uses the following identity valid for binary vectors To summarize, given a number of bits b, this procedure approximates the real optimization problem in n variables (7) with the QUBO (11) of size n�b. We conclude by remarking that we are not restricted to the cube [−1, 1] n . If we instead want to optimize over the cube [−δ, δ] n , we need to repeat the same construction with the precision vector δ � p. An immediate question is how much error is introduced by replacing the cube with a discretized cube. Lemma 1. Lipschitz Estimate Let x T be an optimal solution to (7), and let x A be an optimal solution to (8). Then Proof.
p , leading to the Lipschitz estimate Combining (13) with the inequality Lemma 1 makes a trade-off apparent. With more bits, the solution on the discretized cube will better approximate the true solution of (7) but will require solving a larger QUBO.
Indeed, numerical experiments from subsequent sections will show that b = 2 generally requires more iterations than b = 8, indicating that the quality of the approximate solution at each step is worse, although interestingly using b = 2 takes less time overall since solving smaller QUBOs requires less computational effort. It is also worth noting that estimate (12) does not control the actual distance between solutions, ||x A − x T ||. In the special case when Q is positive definite, this distance can be controlled, but it would be interesting to have estimates in greater generality.
The annealers that one works with in practice are never ideal, and so will rarely return the absolute best solution x A but instead a response consisting of many samples x 1 , x 2 , . . ., x l of good solutions with energies E(x i ) � E(x i+1 ). (Here energy of a solution x i refers to the value of the objective function at x i ). An obvious approach is to treat the lowest energy solution x 0 as the best approximation of x T . A subtler approach that can reap great benefits in practice is to take a linear combination of the full response: as an approximation of x T , where β is a parameter. Experiments in later sections were conducted either using the best response x 0 or the full response with β = 100 and performance is compared for several values of n and b. Although (15) is an ad hoc method, the reason for why it works could be due to the energy distribution of the QUBO results. Provided the probability of getting an exited state is Boltzmann distributed (Fermi distribution limit for a single particle at low temperature limits); some responses with considerable low energies could introduce "bit flips" that might correct for the discretization error when added as a linear combination with weights that would decay as an exponential of their energies. A more formal explanation using quantum dynamics for why equation 14 works deserves a further study.
Another approach to get better approximations of x T is to solve a sequence of QUBOs with bias. More precisely, get an initial approximation of x T by following the previous procedure to produce x T 1 . Then modify the QUBO by adding a linear term À ax t T 1 x where α > 0 and find approximate solutions to By Cauchy-Schwartz, (16) the annealer is encouraged to produce solutions in the direction of x T 1 . The annealer produces a new lower-energy solution x T 2 and this process can be repeated until the new solution no longer has lower energy than the previous. Experimental results in later sections contain data with α = 0 and.1. Biasing is most helpful in the initial phase of the algorithm when it is iteratively producing solutions close to previous solutions. In later phases biasing is less useful, as will be evident from the results.

An iterative descent algorithm
Algorithms that solve continuous optimization problems rely on a good initial guess and an iterative descent rule. These tasks can be formulated as QUBO problems when trying to minimize the Rayleigh quotient over the unit sphere.

Obtaining an initial guess.
Let λ 1 � λ 2 � . . . � λ n be the eigenvalues of A. To get an initial approximation of λ 1 , one can ask to solve as an approximation of An immediate problem is that if A is positive definite, the solution to (17) is just x = 0. This can be remedied by replacing A with A − λI n , where λ 2 (λ i , λ i+1 ) for some i. The eigenvectors are unaffected, the eigenvalues can be recovered from the new matrix and the solutions to tend to be long, nonzero vectors very close to the span of eigenvectors which have negative eigenvalues for A − λI n , namely v 1 , . . ., v i . A good initial choice is the average of the eigenvalues l ¼ trðAÞ n . This is a natural choice since it ensures the initial guess to be within the bounds of the eigenspectrum. As the algorithm progresses, λ will decrease towards λ 1 . To converge in fewer iterations, it's better to choose λ close to, but greater than λ 1 , as we will examine later.
These observations lead to the following iterative fixed-point method to produce a good initial guess for the lowest eigenvector. Initially solve (19) with l ¼ trðAÞ n to produce a guess v 1 . Update λ using the Rayleigh quotient l ¼ v t 1 Av 1 kv 1 k 2 and solve (19) again possibly using v 1 as a bias vector to produce a second guess v 2 . Repeat until λ is no longer decreasing.
For small matrices, say 10 × 10, this procedure often suffices to produce the lowest eigenvalue with 2-3 digits of accuracy and the corresponding eigenvector to within a distance of order.1 of the true eigenvector. The descent stage of the algorithm increases the precision to essentially arbitrary order as it will be explained in the next section.

Iterative descent.
Suppose we want to minimize a function f : R m ! R, and let rf and H(f) denote the gradient and Hessian of f, respectively. Starting with an initial guess x 0 , a common strategy is to Taylor expand f around x 0 and choose δ to minimize rf � δ, which is gradient descent, or to minimize rf � d þ d t Hðf Þ 2 d, which includes second order methods such as Newton's method, BFGS, Newton Conjugate-Gradient, etc. Once a better solution x 1 = x 0 + δ has been found, Taylor expand around x 1 again and repeat. The proposed algorithm obtains a good descent direction by using a QUBO to find good approximate solutions to Similar to Newton-CG and BFGS, this method requires computing, but not inverting, the Hessian matrix, and benefits from a line search which possibly increases the size of δ. See [10] for more details on classical optimization algorithms and the benefits of line search. Here the line search step amounts to minimizing a quadratic expression, and so the optimal scaling can be directly computed.
If k th approximate solution x k is closer to the true solution than any point in the discretized cube, one cannot expect minimizing the QUBO to produce a better solution. A key part of the descent phase is enforcing a minimum step size in addition to the line search so that the candidate k + 1 st solution is possibly worse than that k th . If the candidate solution is worse, the algorithm discards the candidate and replaces the discretized unit cube C n,b by a scaled-down discretized cube t � C n,b where t � 1, which amounts to repeating the procedure outlined in section 3 with the precision vector t � p. Once the discretized cube has been scaled down, the algorithm continues running until it needs to scale down the cube further, or exits having achieved the desired accuracy.

The algorithm
With the key ingredients covered, we are in a position to present the algorithm. As a reminder, at each step, the objective function is of the form f(x) = x t (A − λI n )x, whose gradient and Hessian can be calculated as rf = 2(A − λI n )x t and H(f) = 2(A − λI n ) respectively. These formulas are implicitly used in the descent phase of the algorithm. At several stages, the algorithm solves optimization problems of the form argmin r t x þ x t Ax. These are turned in to QUBOs as explained in section 2.1, and annealers are used to minimize the QUBOs that appear, possibly using full responses or biasing. In subsequent section the effects of biasing, full responses and other parameters will be analyzed. Two steps merit a bit more explanation. Replacing δ by δ − hv, δiδ forces v, δ to be orthogonal. Since the Rayleigh quotient needs to be optimized over the sphere, the update direction δ should be tangent to the sphere at v, and the tangent space of the sphere at v is precisely the set of vectors orthogonal to v. Second, the scaling δ is computed as t min = max{−v t Hδ/(δ t Hδ), 1}. The expression −v t Hδ/(δ t Hδ) is the line search step coming from minimizing the quadratic (v + t min δ) t H(v + t min δ). Strictly speaking this quadratic expression only has a minimum when δ t Hδ is positive, and in practice when using this algorithm it almost always is, and if not, set t min = 1. A minimum scaling t min � 1 is enforced so that the candidate update v + t min δ possibly overshoots the exact solution, resulting in a worse estimate of the lowest eigenvector. Overshooting is an indication that the discretized cube is no longer fine enough to produce better solutions, and so the candidate update is discarded and the discretized cube is scaled down. Intuitively the scaling at each step should be about the order of 1 2 bÀ 1 , and the numerical experiments below all use a.1 factor.

Algorithm 1 Controlled Precision QUBO-based Algorithm to Compute Eigenvectors of Symmetric Matrices
Oftentimes in practice one wishes to solve a generalized eigenvalue problem of the form Av = λBv. In the case when A, B are symmetric and B is strictly positive definite, the smallest generalized eigenvalue minimizes the generalized Rayleigh quotient The following small changes solves the generalized eigenvalue problem, again to essentially arbitrary precision. First, instead of initializing λ as trðAÞ n , one can generate a random unit vector w (or use a specified vector) and initialize l ¼ w t Aw w t Bw . Second, replace every Rayleigh quotient with the corresponding generalized Rayleigh quotient. Lastly, instead of updating H as H = A − λI n , update as H = A − λB, as the latter preserves the B-eigenspectrum of A, while the former does not.
We conclude by emphasizing that this algorithm reaches arbitrary precision without increasing the size of the QUBOs, all of which involve n � b binary variables. Additionally, all quadratic problems are of the form r t x + x t (A − λI n )x where A is fixed, implying that the potential non-zero coefficients of the QUBO do not change (examine formula (11)).

Experimental results
The algorithm and its variants are tested on a class of random matrices of varying sizes and on benchmark sparse matrices using D-Wave's simulated annealing (SA) software unless otherwise specified. For each experiment, the algorithm ran until the Rayleigh quotient of the approximate eigenvector was within 10 −8 of the true value. The total annealing time is hence computed as the total amount of time the algorithm spends on performing SA. The SA is the bottle-neck for the algorithm, and we include the time to give the reader a sense of how the run-time varies with the size of the matrix n and the number of bits b, as we will demonstrate in subsequent sections. Adjusting the time for each anneal will not affect the accuracy of the algorithm, as it can reach desired precision with any reasonably good annealer, but may affect the run-time. For a specific architecture, one might be able to tune the anneal time optimally based on n and b, which is not a question we attempt to answer here.

Basic performance
First, we demonstrate the convergence as a function of the number of iterations using example matrices from the TAMU SuiteSparse collection [14]. Fig 1 shows performance on the breast-tissue_10NN matrix, a weighted graph adjacency matrix of size 106 × 106 for 2, 4, 6 and 8 bits using best response and no biasing.
Interestingly using fewer bits gives less time to reach desired accuracy despite requiring more iterations. A log − log regression on the MP matrices (see next section) gives that the anneal time grows like (n � b) 1.57 and the number of iterations grows like n .44 b −.32 so the total time is roughly n 2 b 1.2 . The algorithm works on even larger matrices, as is demonstrated in Fig  2 using the spaceShuttleEntry_1 matrix, a 560 × 560 control matrix. Fig 3 demonstrates the performance for the generalized eigenvalue problems using mesh1em1 as the A matrix and meshe1 as the B matrix, two 48 × 48 matrices from the Suite-Sparse database.
In order to try to get algorithms that run as fast as possible, one might ask if it is possible get the algorithm to work using 1 bit of precision. With the current scheme this cannot be done. However, by reformulating the problem as an Ising instead of QUBO, one can indeed use only one bit precision. Ising problems are of the form argmin x2fÀ 1;1g n h t x þ x t Qx the main distinction from QUBOs being the spin variables ±1. QUBOs or Ising problems are mathematically equivalent, and most annealers are capable of solving either.
Using the Ising formulation, its possible to mimic the same algorithm, which works well on very small matrices. However, for larger matrices, such as for the 106 × 106 weighted adjacency matrix the single-bit version of the algorithm takes longer than using two bits, taking 32.3 seconds and requiring over 400 iterations (compare with Fig 1). An educated guess for why this might happen follows. Since the solutions produced by Ising problems have coordinates that are all non-zero and of the same magnitude, if the algorithm has already produced a solution whose k th coordinate is close to the true value, the added solution from the Ising problem will force that coordinate away from the optimal value. Using two bits is effective because the solutions can have coordinates that are positive, negative, or zero.

Analysis of the parameters
To demonstrate the effect of biasing and full response parameters, the algorithm is tested on small matrices of sizes 3, 10 and 20 with number of bits b = 2, 4, 6 and 8. In the interest of not overwhelming the reader with plots and data, only the data for matrices of size 10 and 20 is displayed. We analyze the error at the end of the initial guess phase, and the average number of iterations each method requires. For each choice of size, bits and parameters, 10 Marchenko-Pasture matrices [15] with parameter λ = .3 are generated and the average errors at the end of the descent phase is recorded. Ideally this initial phase should end with the smallest possible error before beginning the descent phase. Towards this end, taking full responses (Eq (15)) and biases (Eq (16)) can be very beneficial. However, this benefit fades as the sizes of the QUBOs increase as one can see from Figs 4 and 5.
The choice to initialize λ as trðAÞ n is motivated by a desire to produce an initial guess which is close to, but greater than, the true lowest eigenvector. To demonstrate this effect on 10 × 10 and 20 × 20 matrices, we compare performance initializing as trðAÞ n , which is the average of all   the eigenvalues against initializing as the highest Gershgorin bound, which upper bounds the maximum eigenvalue [16]. Choosing an initialization closer to the true eigenvalue often leads to fewer iterations, although the difference is somewhat small and fades as the number of bits increases, as seen in Fig 6.

Gap size analysis
Here we analyze the effect of the spacing between eigenvalues. In particular, the gap |λ 1 − λ 2 | can significantly affect the number of iterations required to reach a given precision. For this experiment, given a gap size g = |λ 1 − λ 2 |, an orthogonal matrix U is chosen at random with respect to the Haar measure using the SciPy implementation of [17]. The algorithm is then analyzed on the matrix U t Diag(0, g, 1, . . ., n − 2)U. As Fig 7 demonstrates, as the gap size decreases the algorithm takes longer to achieve a given accuracy. The exception is when the gap is 0, and the smallest eigenvalue appears with multiplicity. In this case the algorithm actually requires fewer iterations.  In the case when there is degeneracy, that is the gap g is 0, one might want two eigenvectors that span the eigenspace. This can be accomplished by running the algorithm once to get an approximate eigenvector v 1 , replace the matrix A with A þ av 1 v t 1 where α > 0, and run the algorithm again to get the eigenvector v 2 . By the spectral theorem for the symmetric matrix is necessary for numeric purposes. The gap g is never numerically zero, so if the algorithm is run twice on the matrix A even with different randomization, it will often produce the same vector. One can also try to take advantage of the first computation by initializing the approximate eigenvalue to λ n + � in the second run of the algorithm. The data shown below in Fig 9 was collected for a ¼ trðAÞ n À l n and � = 1, and one can see a slight boost in performance in the second run of the algorithm.  The smaller the gap, the more iterations required, especially when the number of bits is small. In the extreme case where the gap is 0 and the lowest eigenvalue appears with multiplicity, the algorithm is actually faster in the sense that fewer iterations are needed for the computed eigenvalue to approximate the true eigenvalue. https://doi.org/10.1371/journal.pone.0267954.g007

Conclusion
We have proposed and tested an algorithm to find eigenvectors of symmetric matrices by minimizing the corresponding Rayleigh quotient with an iterative steepest-descent method. Initial guesses and subsequent descent directions are found by looking for minima over discretized cubes of various sizes, encoded as QUBO problem which is in turn solved with a SA method. The algorithm is able to reach essentially arbitrary precision even for fairly large matrices. We have performed a thorough study of the effect of the different parameters, including, the eigenvalue spacing, initial guesses, number of bits, and the matrix size. We have explored the possibility of using a single bit precision by reformulating the QUBO problem as an Ising problem. Finally, we have introduced two novel approaches to accelerate the convergence such as biasing and using a larger set of solution from the SA step. These two approaches might be applicable to other QUBO based problems. We encourage the reader to test these algorithms on other annealing devices.